Abstract

The aim of this article is to simplify Pfanzagl's proof of consistency for asymptotic maximum likelihood estimators, and to extend it to more general asymptotic M-estimators. The method relies on the existence of a sort of contraction of the parameter space which admits the true parameter as a fixed point. The proofs are short and elementary.



1 Introduction 

After the seminal work¹ of Fisher, the asymptotic properties of maximum likelihood estimators, and in particular their consistency, were studied by various authors, including Doob [Doo34], Cramér [Cra46], and Huzurbazar [Huz48]. Nowadays, one of the best known results regarding consistency goes back to Wald, who gave in [Wal49] a short and elegant proof of strong consistency of parametric maximum likelihood estimators. Since then, several authors have studied various versions of such consistency problems, including, among others, Le Cam [LC53], Kiefer and Wolfowitz [KW56], Bahadur [Bah71], Huber [Hub67], Perlman [Per72], Wang [Wan85], and Pfanzagl [Pfa88, Pfa90].

Wald's original proof relies roughly on local compactness of the parameter space, on continuity and coercivity² of the log-likelihood, on the law of large numbers, and last but not least on local uniform integrability of the log-likelihood. It does not require differentiability, and makes extensive use of likelihood ratios. The integrability assumption has been weakened by many authors, including for instance Kiefer and Wolfowitz in [KW56] and Perlman in [Per72]; see also [Bah71]. One can find a modern presentation of Wald's method for M-estimators in van der Vaart's monograph [vdV98].

Pfanzagl gave in [Pfa88, Pfa90] a proof of strong consistency of asymptotic maximum likelihood estimators for nonparametric "concave models" with respect to the estimated parameter, including nonparametric mixtures. His approach relies in particular on a simplification of an earlier work of Wang in [Wan85], based on a uniform local bound of the likelihood ratio.

¹ The interested reader may find a quite recent account in [Ald97] and references therein.
² By coercivity we mean that the log-likelihood tends to $-\infty$ when the parameter tends to $\infty$.

The present work was initially motivated by the inverse problems considered in [CC06]. Our aim is to simplify Pfanzagl's approach, and to extend the framework from asymptotic maximum likelihood to more general asymptotic M-estimators. In particular, log-likelihood ratios are replaced by contrast differences. The hypotheses appearing in our main Theorem are unnecessarily strong; however, they allow a simple and short presentation. We emphasize the role played by a sort of contraction map $a^*$ defined on the parameter space. We do not assume any coercivity of the contrast as in [Wal49]. However, we require the compactness of the space of the estimated parameter, as in [KW56] and [vdV98] for example. This compactness usually comes for free in the case of fully nonparametric models. We do not make use of any Uniform Law of Large Numbers. Our method does not belong to the Glivenko-Cantelli approaches to consistency, as in [Dud98], [Fio00], [AK94], [vdV98] and [vdG03, vdG00] and references therein.

Let $\Theta$ be a separable Hausdorff topological space with countable base. Let $(P_\theta)_{\theta\in\Theta}$ be a known family of Borel measures on a measurable space $\mathcal{X}$. Let $\theta^*\in\Theta$ be some unknown point of $\Theta$ such that $P^*:=P_{\theta^*}$ is a probability measure. Let $(X_n)_{n\in\mathbb{N}}$ be an i.i.d. sequence of observed random variables defined on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and taking their values in $\mathcal{X}$, with common law $P^*$. Let $(\theta_n)_{n\in\mathbb{N}}$ be a sequence of random variables defined on $(\Omega,\mathcal{F},\mathbb{P})$, taking their values in $\Theta$, and such that $\theta_n$ is $\mathcal{F}_n$-measurable for any $n\in\mathbb{N}$, where $\mathcal{F}_n:=\sigma(X_1,\ldots,X_n)$. We say that $(\theta_n)_{n\in\mathbb{N}}$ is strongly consistent if and only if

$$\mathbb{P}\text{-a.s.},\quad \lim_{n\to+\infty}\theta_n=\theta^*. \tag{1}$$

We use in the sequel the abbreviations "a.s." for almost sure, "a.a." for almost all, and "a.e." for almost everywhere. Let $\Theta\times\mathcal{X}\ni(\theta,x)\mapsto m(\theta,x)\in\mathbb{R}$ be a known function such that $m_\theta:=m(\theta,\cdot)$ is measurable for any $\theta\in\Theta$. For any $n$, we define the random function $M_n:\Theta\to\mathbb{R}$ by

$$M_n(\theta):=\frac{1}{n}\sum_{i=1}^n m(\theta,X_i).$$

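This can also be written $M_n(\theta)=\mathbb{P}_n m_\theta$, where $\mathbb{P}_n:=\frac{1}{n}(\delta_{X_1}+\cdots+\delta_{X_n})$ is the empirical measure. We say that $(\theta_n)_n$ is a sequence of asymptotic M-estimators if and only if

$$\mathbb{P}\text{-a.s.},\quad \lim_{n\to+\infty}\Big(\sup_{\Theta}M_n-M_n(\theta_n)\Big)=0. \tag{2}$$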

The term asymptotic is used for the same notion (with the likelihood) by Pfanzagl in [Pfa88]. In the literature, some authors, including Wald and Perlman, use the term approximate rather than asymptotic. However, the term approximate has been used by Bahadur in a different sense in [Bah71, page 34].
For example, if for large enough $n$ there exists an $\mathcal{F}_n$-measurable $\theta_n$ in $\Theta$ such that $M_n(\theta_n)=\sup_\Theta M_n$, then such a random sequence $(\theta_n)_{n\in\mathbb{N}}$ fulfils (2).
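As an illustration of definition (2), here is a minimal Python sketch on a toy model that is not taken from the text: $X_i$ i.i.d. $\mathcal{N}(\theta^*,1)$, $\Theta=[-3,3]$ and $m(\theta,x)=-(x-\theta)^2/2$ (the Gaussian log-likelihood up to an additive constant). The helper names `make_contrast` and `grid_m_estimator` are hypothetical. The estimator maximizes $M_n$ only over a grid of mesh $1/n$, so it need not attain $\sup_\Theta M_n$; for this model its defect $\sup_\Theta M_n-M_n(\theta_n)$ is at most $1/(8n^2)$, which tends to $0$ as (2) requires.

```python
import random

def make_contrast(xs):
    # Toy contrast m(theta, x) = -(x - theta)^2 / 2. Expanding the square gives
    # M_n(theta) = -(S2 - 2 * theta * S1 + theta^2) / 2, where S1 and S2 are the
    # first two empirical moments, so M_n is cheap to evaluate on a fine grid.
    n = len(xs)
    s1 = sum(xs) / n
    s2 = sum(x * x for x in xs) / n

    def contrast(theta):
        return -0.5 * (s2 - 2.0 * theta * s1 + theta * theta)

    return s1, contrast

def grid_m_estimator(xs, lo=-3.0, hi=3.0):
    # Maximize M_n over a grid of [lo, hi] with mesh (hi - lo) / k = 1 / n.
    n = len(xs)
    k = int(round((hi - lo) * n))
    grid = [lo + (hi - lo) * j / k for j in range(k + 1)]
    s1, contrast = make_contrast(xs)
    theta_n = max(grid, key=contrast)
    # Exact supremum over [lo, hi]: M_n is a concave parabola with vertex at S1.
    sup_value = contrast(min(max(s1, lo), hi))
    return theta_n, sup_value - contrast(theta_n)

rng = random.Random(0)
theta_star = 1.0
for n in (100, 1000, 10000):
    xs = [rng.gauss(theta_star, 1.0) for _ in range(n)]
    theta_n, defect = grid_m_estimator(xs)
    # defect = sup_Theta M_n - M_n(theta_n), bounded by 1 / (8 n^2) here
    print(n, round(theta_n, 4), defect)
```

Refining the grid with $n$ is one simple way to make the defect vanish. With a fixed grid, the defect for this model converges to half the squared distance from $\theta^*$ to the grid, which is positive unless $\theta^*$ is a grid point.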

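For any probability measure $P$ on $\mathcal{X}$, let $\mathrm{L}^1_+(\mathcal{X},P)$ (resp. $\mathrm{L}^1_-(\mathcal{X},P)$) be the set of random variables $Z:\mathcal{X}\to\mathbb{R}$ such that $Z^+:=\max(+Z,0)$ (resp. $Z^-:=\max(-Z,0)$) is in $\mathrm{L}^1(\mathcal{X},P)$. On $\mathrm{E}(\mathcal{X},P):=\mathrm{L}^1_+(\mathcal{X},P)\cup\mathrm{L}^1_-(\mathcal{X},P)$, the expectation $P(Z)=P(Z^+)-P(Z^-)$ makes sense and takes its values in $\overline{\mathbb{R}}:=\mathbb{R}\cup\{\pm\infty\}$. For any $\theta\in\Theta$ such that $m_\theta\in\mathrm{E}(\mathcal{X},P^*)$, we define the contrast $M^*(\theta)\in\overline{\mathbb{R}}$ by

$$M^*(\theta):=P^*m_\theta. \tag{3}$$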

In the sequel, we say that the model is identifiable when for any $\theta\in\Theta$ the condition $P_\theta=P^*$ implies that $\theta=\theta^*$.

Example 1.1 (Log-Likelihood). Assume that for some fixed Borel measure $Q$ on $\mathcal{X}$, one has $P_\theta\ll Q$ for any $\theta\in\Theta$. Let $f_\theta:=dP_\theta/dQ$ and assume that $f_\theta>0$ on $\mathcal{X}$ for any $\theta\in\Theta$. Define $m(\theta,x):=\log(f_\theta(x))$. Then $M_n:\theta\mapsto M_n(\theta)$ is the log-likelihood random functional given by $M_n(\theta)=\mathbb{P}_n m_\theta=\mathbb{P}_n\log(f_\theta)$. We will speak about sequences of "asymptotic maximum likelihood estimators". The log-likelihood ratio is

$$M_n(\theta_1)-M_n(\theta_2)=\mathbb{P}_n\log(f_{\theta_1}/f_{\theta_2}).$$

As usual for the log-likelihood, when $M^*(\theta^*)$ is finite, one can write for any $\theta$

$$M^*(\theta)-M^*(\theta^*)=-\mathrm{Ent}(P_{\theta^*}\mid P_\theta),$$

where $\mathrm{Ent}(P_{\theta_1}\mid P_{\theta_2})$ is the Kullback-Leibler relative entropy of $P_{\theta_1}$ with respect to $P_{\theta_2}$. In particular, $M^*(\theta)\le M^*(\theta^*)$ with equality if and only if $P_\theta=P_{\theta^*}$, which implies $\theta=\theta^*$ if the model is identifiable. Notice that when $Q$ is the Lebesgue measure on $\mathcal{X}=\mathbb{R}^d$, then $-M^*(\theta^*)=-\int f_{\theta^*}(x)\log(f_{\theta^*}(x))\,dx$ is the Shannon entropy of $f_{\theta^*}$.
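The identity $M^*(\theta)-M^*(\theta^*)=-\mathrm{Ent}(P_{\theta^*}\mid P_\theta)$ can be checked numerically in the simplest setting. This Python sketch assumes, for illustration only, that $\mathcal{X}$ is a three-point set and $Q$ is the counting measure, so that densities are plain probability vectors; the vectors and function names are arbitrary, hypothetical choices.

```python
import math

def log_likelihood_contrast(p_star, f_theta):
    # M*(theta) = P*(log f_theta) when Q is the counting measure on a finite set
    return sum(p * math.log(f) for p, f in zip(p_star, f_theta) if p > 0)

def relative_entropy(p, q):
    # Kullback-Leibler relative entropy Ent(p | q) = sum_x p(x) log(p(x) / q(x))
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p_star = [0.5, 0.3, 0.2]            # density of P* = P_{theta*}
candidates = [[0.2, 0.5, 0.3], [0.4, 0.4, 0.2], [0.5, 0.3, 0.2]]

for f_theta in candidates:
    lhs = log_likelihood_contrast(p_star, f_theta) - log_likelihood_contrast(p_star, p_star)
    rhs = -relative_entropy(p_star, f_theta)
    # lhs and rhs agree up to rounding, and vanish only when f_theta equals p_star
    print(f_theta, lhs, rhs)
```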

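Example 1.2 (Beyond the log-likelihood). Assume that for some fixed Borel measure $Q$ on $\mathcal{X}$, one has $P_\theta\ll Q$ for any $\theta\in\Theta$, with $P_\theta(\mathcal{X})\le1$ and $f_\theta:=dP_\theta/dQ$. Let $\Phi,\Psi:(0,+\infty)\to\mathbb{R}$ be two smooth functions. Assume that $\Psi(f_\theta)\in\mathrm{L}^1(\mathcal{X},Q)$ for any $\theta\in\Theta$. Define $m_\theta$ by

$$m_\theta=\Phi(f_\theta)-\int_{\mathcal{X}}\Psi(f_\theta)\,dQ+P_\theta(\mathcal{X}).$$

This gives rise to the following empirical contrast

$$M_n(\theta)=\mathbb{P}_n(\Phi(f_\theta))-\int_{\mathcal{X}}\Psi(f_\theta)\,dQ+P_\theta(\mathcal{X}).$$

In particular, if $\theta\in\Theta$ is such that $\Phi(f_\theta)\in\mathrm{L}^1(\mathcal{X},P^*)$, where here again $P^*:=P_{\theta^*}$, then

$$M^*(\theta)=P^*(\Phi(f_\theta))-\int_{\mathcal{X}}\Psi(f_\theta)\,dQ+P_\theta(\mathcal{X}).$$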
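Assume now that $u\mapsto u\Phi'(u)$ is locally integrable on $\mathbb{R}_+$, and consider the case where $\Psi$ is the $\Phi$-transform given for any $u\in(0,+\infty)$ by

$$\Psi(u)=\int_0^u v\Phi'(v)\,dv.$$

For $\Phi:u\mapsto\log(u)$, one has $\Psi:u\mapsto u$, and we recover the log-likelihood contrast

$$M^*(\theta)=P^*(\log(f_\theta)).$$

For $\Phi:u\mapsto u$, one has $\Psi:u\mapsto\frac{1}{2}u^2$, and we get the quadratic contrast

$$M^*(\theta)=-\frac{1}{2}\|f_\theta\|^2_{\mathrm{L}^2(\mathcal{X},Q)}+\langle f_{\theta^*},f_\theta\rangle_{\mathrm{L}^2(\mathcal{X},Q)}+P_\theta(\mathcal{X}).$$

In both cases, the map $\theta\mapsto M^*(\theta)$ admits $\theta^*$ as unique maximum provided that the model is identifiable. More generally, define $\Gamma:(0,+\infty)^2\to\mathbb{R}$ by

$$\Gamma(u,v):=u\Phi(v)-\Psi(v)=u\Phi(v)-\int_0^v w\Phi'(w)\,dw.$$

When $\theta$ and $\theta^*$ are such that both $\Gamma(f_{\theta^*},f_{\theta^*})$ and $\Gamma(f_{\theta^*},f_\theta)$ belong to $\mathrm{L}^1(\mathcal{X},Q)$,

$$M^*(\theta)=\int_{\mathcal{X}}\big(\Gamma(f_{\theta^*},f_\theta)-\Gamma(f_{\theta^*},f_{\theta^*})\big)\,dQ+\int_{\mathcal{X}}\Gamma(f_{\theta^*},f_{\theta^*})\,dQ+P_\theta(\mathcal{X}).$$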
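A quick numerical sanity check of the $\Phi$-transform, again assuming (for illustration only) a finite $\mathcal{X}$ with $Q$ the counting measure: the midpoint rule recovers $\Psi(u)=u$ for $\Phi=\log$ and $\Psi(u)=u^2/2$ for $\Phi(u)=u$, and the quadratic contrast coincides with $\sum_x\Gamma(f_{\theta^*}(x),f_\theta(x))+P_\theta(\mathcal{X})$ and stays below its value at $\theta^*$. The densities, step count and function names below are hypothetical.

```python
def psi_transform(phi_prime, u, steps=10000):
    # Psi(u) = int_0^u v * Phi'(v) dv, approximated with the midpoint rule
    h = u / steps
    total = 0.0
    for j in range(steps):
        v = (j + 0.5) * h
        total += v * phi_prime(v)
    return total * h

def quadratic_contrast(f_star, f_theta):
    # Phi(u) = u and Psi(u) = u^2 / 2, Q = counting measure on a finite set:
    # M*(theta) = sum f* Phi(f_theta) - sum Psi(f_theta) + P_theta(X)
    return (sum(a * b for a, b in zip(f_star, f_theta))
            - 0.5 * sum(b * b for b in f_theta) + sum(f_theta))

def gamma(u, v):
    # Gamma(u, v) = u * Phi(v) - Psi(v) for Phi(u) = u
    return u * v - 0.5 * v * v

f_star = [0.5, 0.3, 0.2]        # density of P* (a probability vector)
f_theta = [0.25, 0.25, 0.3]     # a sub-probability density of total mass 0.8

print(psi_transform(lambda v: 1.0 / v, 2.0))   # Phi = log, so Psi(u) = u
print(psi_transform(lambda v: 1.0, 3.0))       # Phi(u) = u, so Psi(u) = u^2 / 2
print(quadratic_contrast(f_star, f_theta),
      sum(gamma(a, b) for a, b in zip(f_star, f_theta)) + sum(f_theta),
      quadratic_contrast(f_star, f_star))
```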

Notice that $\Gamma$ is linear in $\Phi$. One can consider useful examples for which the function $\Phi$ is bounded, in such a way that $m_\theta$ is bounded for any $\theta\in\Theta$. For instance, let us examine the case where $\Phi:u\mapsto-(1+u)^{-2}$. Then $\Psi:u\mapsto u^2(1+u)^{-2}$, and the map $\theta\mapsto M^*(\theta)$ admits $\theta^*$ as unique maximum, provided identifiability holds, since for any $(u,v)\in(0,+\infty)^2$,

$$\Gamma(u,v)=-\frac{u+v^2}{(1+v)^2}\quad\text{and}\quad\Gamma(u,v)-\Gamma(u,u)=-\frac{(u-v)^2}{(1+u)(1+v)^2}.$$

The function $\Psi$ is additionally bounded here. The similar case $\Phi:u\mapsto-(1+u)^{-1}$ is also quite interesting. Notice that $\Gamma(u,\cdot)$ is concave on $(0,+\infty)$ as soon as $\Phi$ is concave, non decreasing, with $\Phi'(v)+v\Phi''(v)\ge0$ for any $v>0$. Observe that this is not the approach of Pfanzagl in [Pfa90], which is more related to the log-likelihood ratio. Notice that in the case of the log-likelihood, one has $\Phi:u\mapsto\log(u)$, which gives $\Psi:u\mapsto u$ and $\Gamma:(u,v)\mapsto u\log(v)-v$, and thus $\Gamma(u,v)-\Gamma(u,u)=-u\log(u/v)+u-v$. It might be possible to study such "$\Phi$-estimators" extensively, in the spirit of the "$\Phi$-calculus" developed in [Cha04, Cha06]. This is however outside the scope of this short article.
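The closed forms stated for the bounded choice $\Phi:u\mapsto-(1+u)^{-2}$ can be checked numerically. This Python sketch (hypothetical helper names, arbitrary test points) compares a midpoint-rule value of $\Psi(u)=\int_0^u v\Phi'(v)\,dv$, with $\Phi'(v)=2(1+v)^{-3}$, against $u^2/(1+u)^2$, and compares $\Gamma(u,v)-\Gamma(u,u)$ against $-(u-v)^2/((1+u)(1+v)^2)$.

```python
def phi(u):
    # Bounded choice Phi(u) = -(1 + u)^(-2)
    return -1.0 / (1.0 + u) ** 2

def psi_closed_form(u):
    # Phi-transform Psi(u) = u^2 / (1 + u)^2, which stays below 1
    return (u / (1.0 + u)) ** 2

def psi_numeric(u, steps=20000):
    # Psi(u) = int_0^u v * Phi'(v) dv with Phi'(v) = 2 (1 + v)^(-3), midpoint rule
    h = u / steps
    return h * sum(2.0 * v / (1.0 + v) ** 3 for v in ((j + 0.5) * h for j in range(steps)))

def gamma(u, v):
    # Gamma(u, v) = u * Phi(v) - Psi(v)
    return u * phi(v) - psi_closed_form(v)

for u in (0.5, 3.0):
    print(u, psi_numeric(u), psi_closed_form(u))
for u, v in [(0.5, 2.0), (1.0, 1.0), (3.0, 0.2)]:
    closed = -(u - v) ** 2 / ((1.0 + u) * (1.0 + v) ** 2)
    print(u, v, gamma(u, v) - gamma(u, u), closed)
```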

One can notice that the observation of Lindsay in [Lin88a, Lin88b] regarding the nature of maximum likelihood for nonparametric mixture models remains valid for more general models provided that $m$ is concave.
2 Main result and Corollaries



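With the settings given in the Introduction, the following Theorem holds.

Theorem 2.1. Assume that $\Theta$ is compact and that the following assumptions hold.

(A1) For $P^*$-a.a. $x\in\mathcal{X}$, the map $m(\cdot,x)$ is continuous on $\Theta$;

(A2) There exists a continuous map $a^*:\Theta\to\Theta$, which may depend on $\theta^*$, such that for any $\theta\neq\theta^*$, there exists a neighborhood $V\subset\Theta$ of $\theta$ for which $\sup_V(m-m_{a^*})\in\mathrm{L}^1_+(\mathcal{X},P^*)$ and $P^*(m_\theta-m_{a^*(\theta)})<0$. Here $\sup_V(m-m_{a^*})$ stands for the function $x\mapsto\sup_{\vartheta\in V}\big(m(\vartheta,x)-m(a^*(\vartheta),x)\big)$.

Then any sequence $(\theta_n)_n$ of asymptotic M-estimators is strongly consistent.

Proof. Postponed to Section 4. □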

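The quantity $P^*(m_\theta-m_{a^*(\theta)})$ in (A2) has a meaning in $\overline{\mathbb{R}}$ since the first part of (A2) ensures that $m_\theta-m_{a^*(\theta)}\in\mathrm{L}^1_+(\mathcal{X},P^*)$. Moreover, $P^*(m_\theta-m_{a^*(\theta)})$ reads $M^*(\theta)-M^*(a^*(\theta))$ when the couple $(m_\theta,m_{a^*(\theta)})$ is in $\mathrm{L}^1_+(\mathcal{X},P^*)\times\mathrm{L}^1_-(\mathcal{X},P^*)$ or in $\mathrm{L}^1_-(\mathcal{X},P^*)\times\mathrm{L}^1_+(\mathcal{X},P^*)$.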

Since $\theta^*$ is unknown in practice, each assumption in Theorem 2.1 must hold for any $\theta^*\in\Theta$ such that $P_{\theta^*}$ is a probability measure, in order to make the result useful.

Remark 2.2 (Assumptions). The first part of (A2) is in a way an M-estimator version of the integrability condition considered by Kiefer and Wolfowitz for the log-likelihood in [KW56]. The assumptions (A1) and (A2) required by Theorem 2.1 can be weakened; however, they permit a streamlined presentation. In particular, only lower semi-continuity is needed in (A1), see for instance [Pfa88]. Additionally, and following for example [Per72, page 266], the uniform integrability assumption can be weakened by considering blocks of $k>1$ observations instead of one observation, see also [vdV98, comments following Theorem 5.14].

As stated in the following Corollary, Theorem 2.1 implies a version of Wald's consistency Theorem for asymptotic M-estimators, see [Wal49], [Per72, Section 2 page 269], and [vdV98, Theorem 5.14].

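Corollary 2.3 (Perlman-Wald). Assume that $\Theta$ is compact, and that for $P^*$-a.a. $x\in\mathcal{X}$, the map $m(\cdot,x)$ is continuous on $\Theta$. Assume that for any $\theta$ in $\Theta$, there exists a neighborhood $V$ of $\theta$ such that $\sup_V m\in\mathrm{L}^1_+(\mathcal{X},P^*)$. Assume in addition that $M^*$ achieves its supremum over $\Theta$ at $\theta^*$, and only at $\theta^*$. Then, any sequence of asymptotic M-estimators is strongly consistent.

Proof. One has $m_\theta\in\mathrm{L}^1_+(\mathcal{X},P^*)$ for any $\theta$ in $\Theta$, and thus $M^*:\Theta\to[-\infty,+\infty)$ is well defined. If $\Theta=\{\theta^*\}$ there is nothing to prove; otherwise $M^*(\theta^*)>-\infty$, since $\theta^*$ is the unique maximizer, and thus $m_{\theta^*}\in\mathrm{L}^1(\mathcal{X},P^*)$. Moreover, (A2) holds with the constant map $a^*\equiv\theta^*$. Namely, for any $\theta\neq\theta^*$, one has on the one hand $P^*(m_\theta-m_{\theta^*})=M^*(\theta)-M^*(\theta^*)<0$, and on the other hand

$$\sup_V(m-m_{a^*})=-m_{\theta^*}+\sup_V m\in\mathrm{L}^1_+(\mathcal{X},P^*).$$

□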
As stated in the following Corollary, Theorem 2.1 implies the main result of Pfanzagl in [Pfa88] for concave models, itself based on an earlier result of Wang in [Wan85]. This is typically the case for mixture models, for which $\Theta$ is a convex set of probability measures on some measurable space, cf. Section 3.

Corollary 2.4 (Pfanzagl-Wang). Let $Q$ be a reference Borel measure on $\mathcal{X}$. Consider the case where $\Theta$ is a convex compact subset of a linear space such that for any $\theta\in\Theta$, $P_\theta(\mathcal{X})\le1$ and $P_\theta\ll Q$ with $f_\theta:=dP_\theta/dQ>0$ on $\mathcal{X}$. Suppose that $Q$-a.e. on $\mathcal{X}$, the map $\theta\mapsto f_\theta(x)$ is concave and continuous on $\Theta$. Assume that the model is identifiable. Consider $m_\theta:=\log(f_\theta)$ and the related log-likelihood $M_n$. Then any sequence of asymptotic maximum likelihood estimators is strongly consistent.

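Proof. First of all, we notice that it is not possible to take $a^*\equiv\theta^*$, since we cannot ensure that the condition $m_{\theta^*}-m_\theta=\log(f_{\theta^*}/f_\theta)\in\mathrm{L}^1_-(\mathcal{X},P^*)$ of (A2) is true. However, the concavity of the model allows us to take a map $a^*$ which is a strict contraction around $\theta^*$. Namely, for an arbitrary $\lambda\in(0,1)$, let us take

$$a^*(\theta):=\lambda\theta^*+(1-\lambda)\theta.$$

This map is continuous and takes its values in $\Theta$ by convexity. The concavity of the model yields

$$m_{a^*(\theta)}-m_\theta=\log\left(\frac{f_{\lambda\theta^*+(1-\lambda)\theta}}{f_\theta}\right)\ge\log\left(\frac{\lambda f_{\theta^*}+(1-\lambda)f_\theta}{f_\theta}\right)\ge\log(1-\lambda).$$

Now, $\log(1-\lambda)$ is a finite constant since $\lambda<1$, so that $\sup_\Theta(m-m_{a^*})\le-\log(1-\lambda)$ belongs to $\mathrm{L}^1_+(\mathcal{X},P^*)$. Define the function $\Phi:\mathbb{R}_+\to\mathbb{R}$ by $\Phi(u):=u\log(\lambda u+(1-\lambda))$. The concavity of the model yields

$$P^*(m_{a^*(\theta)}-m_\theta)\ge\int f_{\theta^*}\log\left(\frac{\lambda f_{\theta^*}+(1-\lambda)f_\theta}{f_\theta}\right)dQ=\int\Phi\left(\frac{f_{\theta^*}}{f_\theta}\right)f_\theta\,dQ.$$

Let us show that the right hand side of the inequality above is strictly positive when $\theta\neq\theta^*$. One has $P_\theta(\mathcal{X})>0$ since $f_\theta>0$. Define $\Psi(u):=u\Phi(1/u)=\log(\lambda/u+1-\lambda)$ for $u>0$. Since $\int(f_{\theta^*}/f_\theta)\,dP_\theta=P^*(\mathcal{X})=1$, Jensen's inequality for the probability measure $P_\theta(\mathcal{X})^{-1}P_\theta$ and the convex function $\Phi$ yields

$$\int\Phi\left(\frac{f_{\theta^*}}{f_\theta}\right)f_\theta\,dQ\ge\Psi(P_\theta(\mathcal{X})). \tag{4}$$

It is enough to show that either (4) is strict or the right hand side of (4) is strictly positive. Since $\lambda>0$, the function $\Phi$ is strictly convex. Thus equality holds in (4) if and only if $P_\theta(f_{\theta^*}=\alpha f_\theta)=1$ for some $\alpha\in\mathbb{R}_+$. Integrating with respect to $Q$ then gives $\alpha P_\theta(\mathcal{X})=P_{\theta^*}(\mathcal{X})=1$, while identifiability forbids $P_\theta(f_{\theta^*}=f_\theta)=1$; hence the only admissible case is $\alpha=P_\theta(\mathcal{X})^{-1}>1$. Therefore, if $P_\theta(\mathcal{X})=1$, inequality (4) is necessarily strict. On the other hand, $\Psi(1)=0$ and $\Psi(u)>0$ when $u<1$. Thus the right hand side of (4) is always non negative, and is strictly positive as soon as $P_\theta(\mathcal{X})<1$. We conclude that $P^*(m_{a^*(\theta)}-m_\theta)>0$ as soon as $\theta\neq\theta^*$. This shows that (A2) holds with $V=\Theta$, and the proof is thus complete. □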
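To see the contraction argument at work, here is a small Python sketch on a hypothetical concave model that is not from the text: $\mathcal{X}$ has three points, $Q$ is the counting measure, and $f_\theta=\theta g_1+(1-\theta)g_2$ for $\theta\in\Theta=[0,1]$, with two fixed positive probability vectors $g_1\neq g_2$ (so the model is linear, hence concave, and identifiable). It evaluates $P^*(m_{a^*(\theta)}-m_\theta)$ for $a^*(\theta)=\lambda\theta^*+(1-\lambda)\theta$, which the proof shows to be positive for $\theta\neq\theta^*$, and the pointwise minimum of $m_{a^*(\theta)}-m_\theta$, which should stay above $\log(1-\lambda)$. The names `G1`, `G2`, `drift` and `min_log_ratio` are illustrative.

```python
import math

G1 = [0.6, 0.3, 0.1]   # hypothetical component densities w.r.t. counting measure Q
G2 = [0.1, 0.3, 0.6]

def density(theta):
    # Linear, hence concave, model f_theta = theta * g1 + (1 - theta) * g2 on Theta = [0, 1]
    return [theta * a + (1.0 - theta) * b for a, b in zip(G1, G2)]

def a_star(theta, theta_star, lam):
    # Strict contraction around theta*: a*(theta) = lam * theta* + (1 - lam) * theta
    return lam * theta_star + (1.0 - lam) * theta

def drift(theta, theta_star, lam):
    # P*(m_{a*(theta)} - m_theta) with m_theta = log f_theta and P* = P_{theta*}
    f_s = density(theta_star)
    f_a = density(a_star(theta, theta_star, lam))
    f_t = density(theta)
    return sum(p * math.log(x / y) for p, x, y in zip(f_s, f_a, f_t))

def min_log_ratio(theta, theta_star, lam):
    # Minimum over x of m_{a*(theta)}(x) - m_theta(x), to compare with log(1 - lam)
    f_a = density(a_star(theta, theta_star, lam))
    f_t = density(theta)
    return min(math.log(x / y) for x, y in zip(f_a, f_t))

theta_star, lam = 0.3, 0.5
for theta in (0.0, 0.2, 0.3, 0.7, 1.0):
    print(theta, drift(theta, theta_star, lam), min_log_ratio(theta, theta_star, lam))
print("log(1 - lam) =", math.log(1.0 - lam))
```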
Remark 2.5 (About the map $a^*$). Let $a^*:\Theta\to\Theta$ be a map which satisfies the condition $P^*(m_\theta-m_{a^*(\theta)})<0$ of (A2) for any $\theta\neq\theta^*$. Then, the impossibility of $P^*(m_\theta-m_\theta)<0$ for any $\theta$ yields that

• $a^*(\theta)\neq\theta$ for any $\theta\neq\theta^*$. In particular,

  - the map $a^*$ cannot be the identity map;

  - if $a^*$ is constant, then $a^*\equiv\theta^*$;

  - the point $\theta^*$ is the only possible fixed point of $a^*$.

The proof of Corollary 2.3 gives an example where $a^*\equiv\theta^*$ works and fulfills (A2). In contrast, Corollary 2.4 provides a situation where a constant $a^*$ does not fulfill (A2). However, we have shown in the proof of Corollary 2.4 that a map $a^*$ which is a strict contraction around $\theta^*$ fulfills (A2). Actually, when $\Theta$ has the structure of a convex subset of a vector space, any strict contraction around $\theta^*$ fulfills the properties of $a^*$ listed above. The existence of a fixed point can be related to Brouwer-like fixed point Theorems. For instance, any continuous mapping of a non-empty compact convex subset of $\mathbb{R}^d$ into itself has at least one fixed point. Consequently, when $\Theta$ is a non-empty compact and convex subset of $\mathbb{R}^d$, any continuous map $a^*$ satisfying (A2) admits $\theta^*$ as its unique fixed point. There exist numerous dimension-free Brouwer-like fixed point theorems, due to Schauder, Tikhonov, Kakutani, ..., see for instance [Zei80].
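In dimension one, the Brouwer fixed point theorem reduces to the intermediate value theorem, which suggests a simple way to locate the fixed point of a continuous map $a^*:[0,1]\to[0,1]$ numerically. The Python sketch below runs bisection on $t\mapsto a^*(t)-t$; the two maps are illustrative choices with $\theta^*=0.3$ (an affine strict contraction as in the proof of Corollary 2.4, and a non-affine continuous self-map), not maps taken from the text. By Remark 2.5, if such a map satisfies (A2), its fixed point can only be $\theta^*$.

```python
import math

def fixed_point_bisection(a, lo=0.0, hi=1.0, tol=1e-12):
    # For a continuous a mapping [lo, hi] into itself, g(t) = a(t) - t satisfies
    # g(lo) >= 0 >= g(hi); bisection keeps this sign pattern, so a zero of g,
    # i.e. a fixed point of a, stays in [lo, hi] while the interval shrinks.
    def g(t):
        return a(t) - t
    if g(lo) == 0.0:
        return lo
    if g(hi) == 0.0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) >= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta_star, lam = 0.3, 0.5

def affine_contraction(t):
    # a*(t) = lam * theta* + (1 - lam) * t, as in the proof of Corollary 2.4
    return lam * theta_star + (1.0 - lam) * t

def nonlinear_contraction(t):
    # Another continuous self-map of [0, 1] whose only fixed point is theta*
    return theta_star + 0.5 * math.sin(t - theta_star)

print(fixed_point_bisection(affine_contraction))
print(fixed_point_bisection(nonlinear_contraction))
```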



Remark 2.6 (Infinite values of $m$). Theorem 2.1 does not allow $m$ to take the value $-\infty$. This limitation is due to the fact that differences of the form $m_\theta-m_{\theta'}$ do not make sense if $m$ is allowed to take the value $-\infty$. The consistency proof of Wald does not suffer from such a limitation since it does not rely on differences of $m$, but it requires strong uniform integrability assumptions instead. A careful reading of the proof of Theorem 2.1 shows that only differences of the form $m_\theta-m_{a^*(\theta)}$ are involved. On the other hand, according to Remark 2.5, $a^*(\theta)\neq\theta$ for any $\theta\neq\theta^*$. Consequently, one may allow, in Theorem 2.1, the map $m(\theta,x)$ to take the value $-\infty$ for at most one value of $\theta$. For the log-likelihood, $m_\theta=\log(f_\theta)$ and one has $m_\theta(x)=-\infty$ if and only if $f_\theta(x)=0$. One may allow $f_\theta=0$ for at most one value of $\theta$ in Corollary 2.4.

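Remark 2.7. Let $\theta\in\Theta$ be such that $m_\theta\in\mathrm{E}(\mathcal{X},P^*)$. Then, the law of large numbers applies and gives that $\mathbb{P}$-a.s., $\lim_n M_n(\theta)=M^*(\theta)\in\overline{\mathbb{R}}$, where the a.s. subset of $\Omega$ may depend on $\theta$. In particular $M_n(\theta)=M^*(\theta)+o_{\mathbb{P}}(1)$ when $M^*(\theta)$ is finite. For a sequence $(\theta_n)_n$ satisfying (2), one can write, for any $\theta\in\Theta$ with finite $M^*(\theta)$,

$$M_n(\theta_n)=M_n(\theta_n)-M_n(\theta)+M_n(\theta)\ge o_{\mathbb{P}}(1)+M^*(\theta),$$

where the last step follows from (2), which gives $M_n(\theta_n)-M_n(\theta)\ge-\big(\sup_\Theta M_n-M_n(\theta_n)\big)$, and from the law of large numbers.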
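The law of large numbers step in Remark 2.7 can be observed on a toy Gaussian location model that is not from the text: $X_i$ i.i.d. $\mathcal{N}(\theta^*,1)$ with $\theta^*=1$ and $m(\theta,x)=-(x-\theta)^2/2$, for which $M^*(\theta)=-(1+(\theta-\theta^*)^2)/2$. This Python sketch (hypothetical names `limit_contrast`, `empirical_contrast`) compares $M_n(\theta)$ with $M^*(\theta)$ for a few values of $\theta$, and checks that the maximizer of $M_n$, here the sample mean, satisfies $M_n(\theta_n)\ge M_n(\theta)$.

```python
import random

THETA_STAR = 1.0

def m(theta, x):
    # Hypothetical contrast: log-density of N(theta, 1) up to an additive constant
    return -0.5 * (x - theta) ** 2

def limit_contrast(theta):
    # M*(theta) = P* m_theta = -(1 + (theta - theta*)^2) / 2 when P* = N(theta*, 1)
    return -0.5 * (1.0 + (theta - THETA_STAR) ** 2)

def empirical_contrast(theta, xs):
    # M_n(theta) = (1/n) * sum_i m(theta, X_i)
    return sum(m(theta, x) for x in xs) / len(xs)

rng = random.Random(42)
xs = [rng.gauss(THETA_STAR, 1.0) for _ in range(100000)]
theta_n = sum(xs) / len(xs)          # maximizer of M_n over the real line
best = empirical_contrast(theta_n, xs)
for theta in (-1.0, 0.0, 1.0, 2.5):
    value = empirical_contrast(theta, xs)
    print(theta, value, limit_contrast(theta), best >= value)
```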
3 Log-Likelihood and mixture models

For any topological space $Z$ equipped with its Borel $\sigma$-field, we denote by $\mathcal{M}_1(Z)$ the set of probability measures on $Z$, and by $\mathcal{C}_b(Z)$ the set of bounded real valued continuous functions on $Z$. The Prohorov topology on $\mathcal{M}_1(Z)$ is defined as follows: $\theta_n\to\theta$ in $\mathcal{M}_1(Z)$ if and only if $\int_Z f\,d\theta_n\to\int_Z f\,d\theta$ for any $f\in\mathcal{C}_b(Z)$. It is known that a subset of $\mathcal{M}_1(Z)$ is relatively compact if and only if it is tight (Prohorov's theorem, valid for instance when $Z$ is Polish). As a consequence, $\mathcal{M}_1(Z)$ is not compact in general. Following [Pfa88, Section 5 page 149], the set of sub-probabilities provides a compactification which allows the following consistency result for asymptotic log-likelihood estimators of nonparametric mixture models.

Corollary 3.1 (Pfanzagl). Let $Z$ be a locally compact Hausdorff topological space with countable base. Let $Q$ be a measure on a measurable space $\mathcal{X}$. Let $k:\mathcal{X}\times Z\to(0,+\infty)$ be such that $\int k(x,z)\,dQ(x)=1$ for any $z\in Z$ and $k(x,\cdot)\in\mathcal{C}_b(Z)$ for any $x\in\mathcal{X}$. Let $\Theta:=\mathcal{M}_1(Z)$ and consider the family $(P_\theta)_{\theta\in\Theta}$ of probability measures on $\mathcal{X}$ defined by $dP_\theta=f_\theta\,dQ$ with $f_\theta(x):=\int k(x,z)\,d\theta(z)$. Assume that the model is identifiable. Let $m:\Theta\times\mathcal{X}\to\mathbb{R}$ be the map defined by $m(\theta,x):=\log f_\theta(x)$, and $M_n$ be the corresponding log-likelihood. Then any sequence of asymptotic maximum likelihood estimators is strongly consistent for the Prohorov topology.

Proof. As explained above, $\Theta=\mathcal{M}_1(Z)$ is not compact for the Prohorov topology, and one must consider a suitable compactification, as in [Bah71] for instance. Let $\mathcal{C}_0(Z)$ be the set of real valued continuous functions on $Z$ which vanish at infinity. Let $\overline{\Theta}$ be the set of Borel measures $\theta$ on $Z$ such that $\theta(Z)\le1$ (i.e. sub-probabilities), equipped with the vague topology related to $\mathcal{C}_0(Z)$. Namely, $\theta_n\to\theta$ in $\overline{\Theta}$ if and only if $\int_Z f\,d\theta_n\to\int_Z f\,d\theta$ for any $f\in\mathcal{C}_0(Z)$. The injection $\Theta\subset\overline{\Theta}$ is continuous; $\overline{\Theta}$ is a compact metrizable topological space, and thus has a countable base. Moreover, $\overline{\Theta}$ is convex, and for any $\theta\in\overline{\Theta}$, there exist $\theta'\in\Theta$ and $\alpha\in[0,1]$ such that $\theta=\alpha\theta'$.
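The need for sub-probabilities can be seen on Dirac masses escaping to infinity. In this Python sketch (illustrative choices only: $Z=\mathbb{R}$, $\theta_n=\frac12\delta_0+\frac12\delta_n$, $f_0(z)=e^{-z^2}\in\mathcal{C}_0(\mathbb{R})$ and the constant function $1\in\mathcal{C}_b(\mathbb{R})$), one sees $\int f_0\,d\theta_n\to\frac12=\int f_0\,d(\frac12\delta_0)$, so $\theta_n$ converges vaguely to the sub-probability $\frac12\delta_0\in\overline{\Theta}$, while the total mass $\int1\,d\theta_n$ stays equal to $1$; hence $(\theta_n)$ has no Prohorov limit in $\mathcal{M}_1(\mathbb{R})$.

```python
import math

def integrate(f, atoms):
    # Integral of f against a finitely supported measure given as (weight, location) pairs
    return sum(w * f(z) for w, z in atoms)

def theta_n(n):
    # theta_n = (1/2) delta_0 + (1/2) delta_n: half of the mass escapes to infinity
    return [(0.5, 0.0), (0.5, float(n))]

def f0(z):
    # Continuous and vanishing at infinity: an element of C_0(R)
    return math.exp(-z * z)

def one(z):
    # Bounded and continuous but not vanishing at infinity: in C_b(R) only
    return 1.0

vague_limit = [(0.5, 0.0)]           # the sub-probability (1/2) delta_0
for n in (1, 2, 5, 10):
    print(n, integrate(f0, theta_n(n)), integrate(one, theta_n(n)))
print(integrate(f0, vague_limit), integrate(one, vague_limit))
```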

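We extend the set of probability measures $(P_\theta)_{\theta\in\Theta}$ on $\mathcal{X}$ to the set of sub-probability measures $(P_\theta)_{\theta\in\overline{\Theta}}$ on $\mathcal{X}$, where $dP_\theta=f_\theta\,dQ$ and $f_\theta(x):=\int k(x,z)\,d\theta(z)$. One has by virtue of the Fubini-Tonelli Theorem that $P_\theta(\mathcal{X})=\theta(Z)$, and thus $P_\theta\in\mathcal{M}_1(\mathcal{X})$ if and only if $\theta\in\Theta:=\mathcal{M}_1(Z)$. Notice that $\theta^*$ is taken in $\Theta$.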

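Let $\theta\in\overline{\Theta}$ be such that $P_\theta=P_{\theta^*}$. Since $\theta^*$ is taken in $\Theta$, one has that $P_\theta\in\mathcal{M}_1(\mathcal{X})$, therefore $\theta\in\Theta$ and thus $\theta=\theta^*$ by identifiability in $\Theta$: the extended model is identifiable. Notice that $\overline{\Theta}$ is the convex envelope of $\Theta\cup\{0\}$. The set $\overline{\Theta}$ contains the null measure $0$, for which $f_0=0$ and thus $m_0=-\infty$. If $\theta\in\overline{\Theta}$ with $\theta\neq0$, then $f_\theta>0$ on $\mathcal{X}$ since $k>0$, and thus $m_\theta(x):=\log f_\theta(x)$ is finite for any $x\in\mathcal{X}$. For any $x\in\mathcal{X}$, the map $\theta\in\overline{\Theta}\mapsto m_\theta(x)$ is continuous since $k(x,\cdot)$ is in $\mathcal{C}_0(Z)$.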

For any $\theta\in\overline{\Theta}$ with $\theta\neq0$, one can write $\theta=\alpha\theta'$ with $\theta'\in\Theta$ and $\alpha:=\theta(Z)\in(0,1]$. One has then $f_\theta=\alpha f_{\theta'}$ and thus $m_\theta=\log\alpha+m_{\theta'}$. Therefore,

$$M_n(\theta)=\log\alpha+M_n(\theta')\le M_n(\theta').$$

As a consequence, $\sup_{\theta\in\overline{\Theta}}M_n(\theta)=\sup_{\theta\in\Theta}M_n(\theta)$, and one may substitute $\Theta$ by $\overline{\Theta}$ in the definition (2). Now, let $(\theta_n)_{n\in\mathbb{N}}$ be a sequence in $\Theta$ of asymptotic maximum likelihood estimators. Corollary 2.4 and Remark 2.6 for $(P_\theta)_{\theta\in\overline{\Theta}}$ apply and give the $\mathbb{P}$-a.s. convergence for the vague topology of $(\theta_n)_{n\in\mathbb{N}}$ towards $\theta^*$. Since both the sequence and the limit are in $\Theta$, the convergence holds for the Prohorov topology, and the desired result is established. □

Remark 3.2. A mixture model can always be seen as a conditional model. The observed random variable $X$ with values in $\mathcal{X}$ is the first component of the couple $(X,Z)$ with values in $\mathcal{X}\times Z$. The component $Z$ is not observed. However, the conditional law $\mathcal{L}(X\mid Z=z)$ is known, and has density $k(\cdot,z)$ with respect to $Q$ on $\mathcal{X}$. If $\theta=\mathcal{L}(Z)$, then $\mathcal{L}(X)$ has density $f_\theta$ with respect to $Q$ on $\mathcal{X}$.
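Corollary 3.1 concerns any sequence of asymptotic maximum likelihood estimators, without prescribing how to compute one. As an illustration only, the Python sketch below simulates data through the conditional representation of Remark 3.2 and runs EM iterations for the mixing weights, a standard technique that is not discussed in the text; that EM never decreases the likelihood is a general property of the algorithm. All modeling choices are hypothetical assumptions: $Z=\mathbb{R}$, $Q$ the Lebesgue measure, $k(x,z)$ the $\mathcal{N}(z,1)$ density, and $\theta$ restricted to a fixed finite grid of support points, so the output is only an approximate maximizer over that grid. The last lines check the identity $M_n(\alpha\theta')=\log\alpha+M_n(\theta')$ used in the proof of Corollary 3.1.

```python
import math
import random

def kernel(x, z):
    # Hypothetical kernel k(x, z): the N(z, 1) density in x w.r.t. Lebesgue measure Q
    return math.exp(-0.5 * (x - z) ** 2) / math.sqrt(2.0 * math.pi)

def sample_mixture(rng, atoms, weights, n):
    # Remark 3.2: draw the unobserved Z ~ theta, then X given Z = z from k(., z) dQ
    zs = rng.choices(atoms, weights=weights, k=n)
    return [rng.gauss(z, 1.0) for z in zs]

def log_likelihood(xs, grid, w):
    # M_n(theta) = (1/n) sum_i log f_theta(X_i) with f_theta(x) = sum_j w_j k(x, z_j)
    return sum(math.log(sum(wj * kernel(x, z) for wj, z in zip(w, grid)))
               for x in xs) / len(xs)

def em_step(xs, grid, w):
    # One EM update of the mixing weights on a fixed grid; it never decreases M_n
    new_w = [0.0] * len(grid)
    for x in xs:
        terms = [wj * kernel(x, z) for wj, z in zip(w, grid)]
        total = sum(terms)
        for j, t in enumerate(terms):
            new_w[j] += t / total
    return [v / len(xs) for v in new_w]

rng = random.Random(3)
xs = sample_mixture(rng, atoms=[-2.0, 2.0], weights=[0.3, 0.7], n=500)
grid = [-4.0 + 0.5 * j for j in range(17)]     # support points -4.0, -3.5, ..., 4.0
w = [1.0 / len(grid)] * len(grid)               # start from uniform weights
history = [log_likelihood(xs, grid, w)]
for _ in range(30):
    w = em_step(xs, grid, w)
    history.append(log_likelihood(xs, grid, w))
print(round(history[0], 4), round(history[-1], 4))
# Scaling check: M_n(alpha * theta') = log(alpha) + M_n(theta') for a sub-probability
half = [0.5 * v for v in w]
print(log_likelihood(xs, grid, half), math.log(0.5) + history[-1])
```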



4 Proof of main result 

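Lemma 4.1 (Reformulation). The random sequence $(\theta_n)_n$ is a sequence of asymptotic M-estimators if and only if

$$\mathbb{P}\text{-a.s.},\quad\forall(\theta'_n)_n\in\Theta^{\mathbb{N}},\quad\limsup_{n\to+\infty}\big(M_n(\theta'_n)-M_n(\theta_n)\big)\le0. \tag{5}$$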

Proof. The proof is done "$\omega$ by $\omega$", and the a.s. sets in (2) and (5) are the same. Recall that $(\theta_n)_n$ is a sequence of asymptotic M-estimators if and only if (2) holds. Actually, the definition of the supremum gives $\sup_{\theta\in\Theta}M_n(\theta)-M_n(\theta_n)\ge0$. Therefore, (2) is equivalent to

$$\mathbb{P}\text{-a.s.},\quad\limsup_{n\to+\infty}\Big(\sup_{\theta\in\Theta}M_n(\theta)-M_n(\theta_n)\Big)\le0. \tag{6}$$

The Lemma is thus reduced to the equivalence between (5) and (6). We begin with the proof of the implication (6) $\Rightarrow$ (5). Let $A$ be some $\mathbb{P}$-a.s. set such that (6) holds. We proceed by fixing $\omega\in A$, and we hide the dependency on $\omega$ in the notation of $M_n$ and $\theta_n$ to lighten the expressions. Let $(\theta'_n)_n$ be a sequence in $\Theta$. By definition of the supremum, we have $M_n(\theta'_n)\le\sup_{\theta\in\Theta}M_n(\theta)$. Thus, we get

$$M_n(\theta'_n)-M_n(\theta_n)\le\sup_{\theta\in\Theta}M_n(\theta)-M_n(\theta_n).$$

Taking the $\limsup_{n\to+\infty}$ of both sides and using (6) provides the expected result. It remains to establish the implication (5) $\Rightarrow$ (6). Let $A$ be some $\mathbb{P}$-a.s. set such that (5) holds. Here again, we proceed by fixing $\omega\in A$, and we hide the dependency on $\omega$ in the notation of the random objects like $M_n$ and $\theta_n$. By definition of the supremum, there exists, for any $n$, an element $\theta'_n\in\Theta$ such that

$$\sup_{\theta\in\Theta}M_n(\theta)-M_n(\theta'_n)\le\frac{1}{n}.$$
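Notice that $\theta'_n$ depends on $\omega$ since $M_n$ depends on $\omega$. This yields

$$\limsup_{n\to+\infty}\Big(\sup_{\theta\in\Theta}M_n(\theta)-M_n(\theta'_n)\Big)\le0. \tag{7}$$

Now we write the telescopic sum

$$\sup_{\theta\in\Theta}M_n(\theta)-M_n(\theta_n)=\sup_{\theta\in\Theta}M_n(\theta)-M_n(\theta'_n)+M_n(\theta'_n)-M_n(\theta_n),$$

which gives

$$\limsup_{n\to+\infty}\Big(\sup_{\theta\in\Theta}M_n(\theta)-M_n(\theta_n)\Big)\le\limsup_{n\to+\infty}\Big(\sup_{\theta\in\Theta}M_n(\theta)-M_n(\theta'_n)\Big)+\limsup_{n\to+\infty}\big(M_n(\theta'_n)-M_n(\theta_n)\big).$$

The two terms of the right hand side are $\le0$ by virtue of (7) and (5) respectively. This provides the desired result (6). □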

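Lemma 4.2 (Separation). Assume that $\mathbb{P}$-a.s., for any neighborhood $U$ of $\theta^*$ and for any sequence $(\theta_n)_n$ in $U^c$, there exists a sequence $(\theta'_n)_n$ in $\Theta$ such that

$$\liminf_{n\to+\infty}\big(M_n(\theta'_n)-M_n(\theta_n)\big)>0. \tag{8}$$

Then, any sequence $(\theta_n)_n$ of asymptotic M-estimators is strongly consistent.

Proof. Suppose that (8) holds on some $\mathbb{P}$-a.s. set $A$, and that $(\theta_n)_n$ is a sequence of asymptotic M-estimators which is not strongly consistent. Let $B$ be the $\mathbb{P}$-a.s. set on which (5) holds, given by Lemma 4.1. Saying that $(\theta_n)_n$ is not strongly consistent means that the event on which $(\theta_n)_n$ does not converge to $\theta^*$ has positive probability; in particular, it meets $A\cap B$. Fix $\omega$ in this intersection. There exist a neighborhood $U$ of $\theta^*$ and a subsequence $(\theta_{n_k})_k$ in $U^c$. Completing it into a sequence $(\tilde\theta_n)_n$ in $U^c$ with $\tilde\theta_{n_k}=\theta_{n_k}$, (8) provides a sequence $(\theta'_n)_n$ in $\Theta$ such that

$$\liminf_{k\to+\infty}\big(M_{n_k}(\theta'_{n_k})-M_{n_k}(\theta_{n_k})\big)>0.$$

This contradicts (5), which gives $\limsup_{n\to+\infty}\big(M_n(\theta'_n)-M_n(\theta_n)\big)\le0$ at $\omega\in B$. □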

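Lemma 4.3 (The $a^*$ map). Assume that $\Theta$ is compact and that there exists a map $a^*:\Theta\to\Theta$ such that for any $\theta\neq\theta^*$, there exists a neighborhood $U_\theta$ of $\theta$ such that

$$\mathbb{P}\text{-a.s.},\quad\liminf_{n\to+\infty}\inf_{U_\theta}\big(M_n(a^*)-M_n\big)>0, \tag{9}$$

where $\inf_{U_\theta}(M_n(a^*)-M_n):=\inf_{\vartheta\in U_\theta}\big(M_n(a^*(\vartheta))-M_n(\vartheta)\big)$. Then, any sequence $(\theta_n)_n$ of asymptotic M-estimators is strongly consistent.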






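Proof. Let us show that the assumptions of Lemma 4.2 are fulfilled. We will establish (8) on an a.s. set $A$ which does not depend on the neighborhood $U$ of $\theta^*$. Namely, let $U$ be an open neighborhood of $\theta^*$. For any $\theta\in U^c$, let $U_\theta$ and $A_\theta$ be the neighborhood of $\theta$ and the $\mathbb{P}$-a.s. set for which (9) holds; replacing $U_\theta$ by its interior if necessary, we may assume that $U_\theta$ is open, since (9) remains true on a smaller set. Notice that $A_\theta$ depends on $U_\theta$. The set $U^c\subset\bigcup_{\theta\in U^c}U_\theta$ is compact as a closed subset of the compact set $\Theta$. We can thus extract a finite sub-covering $U^c\subset\bigcup_{i=1}^rU_{\theta_i}$, and write

$$\liminf_{n\to+\infty}\inf_{U^c}\big(M_n(a^*)-M_n\big)\ge\liminf_{n\to+\infty}\min_{1\le i\le r}\inf_{U_{\theta_i}}\big(M_n(a^*)-M_n\big)=\min_{1\le i\le r}\liminf_{n\to+\infty}\inf_{U_{\theta_i}}\big(M_n(a^*)-M_n\big).$$

By virtue of (9), we get from the above that

$$\mathbb{P}\text{-a.s.},\quad\liminf_{n\to+\infty}\inf_{U^c}\big(M_n(a^*)-M_n\big)>0, \tag{10}$$

where the $\mathbb{P}$-a.s. set is $A_U:=\bigcap_{i=1}^rA_{\theta_i}$. Recall that $U$ was a freely chosen open neighborhood of $\theta^*$. Consider now a countable base $(U_k)_k$ of open neighborhoods of $\theta^*$. Then (10) holds for every $U_k$ on the $\mathbb{P}$-a.s. set $A:=\bigcap_{k=1}^\infty A_{U_k}$, which does not depend on $U$. Since any neighborhood $U$ of $\theta^*$ contains some $U_k$, and then $\inf_{U^c}\ge\inf_{U_k^c}$, (10) holds on $A$ for every neighborhood $U$ of $\theta^*$. Notice at this step that

$$M_n(a^*(\theta_n))-M_n(\theta_n)\ge\inf_{U^c}\big(M_n(a^*)-M_n\big)$$

as soon as $\theta_n\in U^c$, by definition of the infimum. This gives (8) from (10) on the $\mathbb{P}$-a.s. set $A$ defined above, with $(\theta'_n)_{n\in\mathbb{N}}=(a^*(\theta_n))_{n\in\mathbb{N}}$. □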

Proof of Theorem 2.1. The desired result follows from Lemma 4.3. Namely, let us show that (9) is a consequence of (A1) and (A2). Let $\theta\neq\theta^*$ and let $a^*$ and $V$ be as in (A2). Let $(V_k)_k$ be a decreasing local base of $\theta$ with $V_0\subset V$. Let $Z:=\inf_V(m_{a^*}-m)$, $Z_k:=\inf_{V_k}(m_{a^*}-m)$ and $Z_\infty:=m_{a^*(\theta)}-m_\theta$. By (A1), the continuity of $a^*$ and the separability of $\Theta$, we get that $Z_k:\mathcal{X}\to\mathbb{R}$ is measurable, and that

$$P^*\text{-a.s.},\quad Z\le Z_k\nearrow Z_\infty.$$

Now, by (A2), we get that $Z\in\mathrm{L}^1_-(\mathcal{X},P^*)$, $Z_\infty\in\mathrm{L}^1_-(\mathcal{X},P^*)$ and $P^*(Z_\infty)>0$. Observe that $Z_k\ge Z\ge-Z^-$ with $Z^-\in\mathrm{L}^1(\mathcal{X},P^*)$. Thus, by the monotone convergence Theorem,

$$\lim_{k\to\infty}P^*(Z_k)=P^*(Z_\infty)>0.$$

Therefore, $P^*(Z_k)>0$ for some $k$ (actually for $k$ large enough). Let us denote $U_\theta:=V_k$. Now, by the law of large numbers,

$$\mathbb{P}\text{-a.s.},\quad\lim_{n\to+\infty}\mathbb{P}_n\Big(\inf_{U_\theta}(m_{a^*}-m)\Big)=P^*\Big(\inf_{U_\theta}(m_{a^*}-m)\Big)>0.$$

This finally gives (9), since for any $n$,

$$\inf_{U_\theta}\big(M_n(a^*)-M_n\big)=\inf_{U_\theta}\mathbb{P}_n(m_{a^*}-m)\ge\mathbb{P}_n\Big(\inf_{U_\theta}(m_{a^*}-m)\Big).$$

□
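The monotone convergence step can be watched on a toy example that is not from the text: the Gaussian location model $m(\theta,x)=-(x-\theta)^2/2$ under $P^*=\mathcal{N}(\theta^*,1)$ with $\theta^*=0$, the constant map $a^*\equiv\theta^*$ as in the proof of Corollary 2.3, the point $\theta=1$, and neighborhoods $V_k=[\theta-r_k,\theta+r_k]$ with shrinking radii. This Python sketch (hypothetical names `z_r`, `THETA`) estimates $P^*(Z_k)$ by Monte Carlo. Since $Z_k$ increases pointwise as the radius shrinks, the empirical means increase too, from a negative value for $r=1$ up to about $P^*(Z_\infty)=(\theta-\theta^*)^2/2=1/2$.

```python
import random

THETA_STAR, THETA = 0.0, 1.0   # hypothetical true parameter and a point theta != theta*

def z_r(x, r):
    # Z(x) = inf over |t - THETA| <= r of m_{a*(t)}(x) - m_t(x), with a* = THETA_STAR
    # constant and m(t, x) = -(x - t)^2 / 2; the infimum is attained at the point
    # of [THETA - r, THETA + r] that is closest to x.
    t = min(max(x, THETA - r), THETA + r)
    return 0.5 * ((x - t) ** 2 - (x - THETA_STAR) ** 2)

rng = random.Random(7)
xs = [rng.gauss(THETA_STAR, 1.0) for _ in range(20000)]
radii = [1.0, 0.5, 0.25, 0.1, 0.01, 0.0]      # shrinking neighborhoods V_k of THETA
means = [sum(z_r(x, r) for x in xs) / len(xs) for r in radii]
for r, value in zip(radii, means):
    print(r, value)   # empirical estimate of P*(Z_k)
```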